ARRAU: Linguistically-Motivated Annotation of Anaphoric Descriptions

Authors

  • Olga Uryupina
  • Ron Artstein
  • Antonella Bristot
  • Federica Cavicchio
  • Kepa Joseba Rodríguez
  • Massimo Poesio
Abstract

This paper presents a second release of the ARRAU dataset: a multi-domain corpus with thorough linguistically motivated annotation of anaphora and related phenomena. Building upon the first release almost a decade ago, considerable effort has been invested in improving the data both quantitatively and qualitatively. Thus, we have doubled the corpus size, expanded the selection of covered phenomena to include referentiality and genericity, and designed and implemented a methodology for enforcing the consistency of the manual annotation. We believe that the new release of ARRAU provides valuable material for ongoing research in complex cases of coreference as well as for a variety of related tasks. The corpus is publicly available through LDC.

Similar resources

Anaphoric Annotation in the ARRAU Corpus

Arrau is a new corpus annotated for anaphoric relations, with information about agreement and explicit representation of multiple antecedents for ambiguous anaphoric expressions and discourse antecedents for expressions which refer to abstract entities such as events, actions and plans. The corpus contains texts from different genres: task-oriented dialogues from the Trains-91 and Trains-93 cor...

Processing definite descriptions in corpora

We discuss in this paper a system that resolves definite descriptions in written texts. A preliminary study of definite descriptions in a collection of 20 texts revealed that about 30% of the 1040 definites in the collection were cases of anaphoric definites whose antecedents had the same head noun, and 50% introduced novel discourse referents. An algorithm which resolves anaphoric definite des...

Performance and limitations of the linguistically motivated Cocoa/Peaberry system in a broad biological domain

We tested a linguistically motivated rule-based system in the Cancer Genetics task of the BioNLP13 shared task challenge. The performance of the system was very moderate, ranging from 52% against the development set to 45% against the test set. Interestingly, the performance of the system did not change appreciably when using only entities tagged by the inbuilt tagger as compared to performance ...

Textual co-reference annotation: a study on definite descriptions

In the linguistic literature many different uses of definite descriptions are acknowledged and explained (Fraurud 1990, Hawkins 1978, Löbner 1985, Prince 1992). These authors give us taxonomies of the different uses of the definite article. Based on these previous works we ran two experiments in annotating definite description uses whose goals were: 1. to observe the distribution of the differe...

Ontology Learning and Semantic Annotation: a Necessary Symbiosis

Semantic annotation of text requires the dynamic merging of linguistically structured information and a “world model”, usually represented as a domain-specific ontology. On the other hand, the process of engineering a domain ontology through semi-automatic ontology learning system requires the availability of a considerable amount of semantically annotated documents. Facing this bootstrapping p...

Publication date: 2016